Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 1738 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 107.1 KiB |
| Average record size in memory | 63.1 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 8 |
| country has constant value "0" | Constant |
|---|---|
| df_index is highly correlated with id and 1 other field | High correlation |
| id is highly correlated with df_index and 1 other field | High correlation |
| state is highly correlated with df_index and 1 other field | High correlation |
| pollutant_min is highly correlated with pollutant_max and 1 other field | High correlation |
| pollutant_max is highly correlated with pollutant_min and 2 other fields | High correlation |
| pollutant_avg is highly correlated with pollutant_min and 2 other fields | High correlation |
| pollutant_id_NH3 is highly correlated with pollutant_max and 1 other field | High correlation |
| df_index is highly correlated with id and 1 other field | High correlation |
| id is highly correlated with df_index and 1 other field | High correlation |
| state is highly correlated with df_index and 1 other field | High correlation |
| pollutant_min is highly correlated with pollutant_max and 2 other fields | High correlation |
| pollutant_max is highly correlated with pollutant_min and 1 other field | High correlation |
| pollutant_avg is highly correlated with pollutant_min and 1 other field | High correlation |
| pollutant_id_PM10 is highly correlated with pollutant_min | High correlation |
| df_index is highly correlated with id and 1 other field | High correlation |
| id is highly correlated with df_index and 1 other field | High correlation |
| state is highly correlated with df_index and 1 other field | High correlation |
| pollutant_min is highly correlated with pollutant_max and 1 other field | High correlation |
| pollutant_max is highly correlated with pollutant_min and 1 other field | High correlation |
| pollutant_avg is highly correlated with pollutant_min and 1 other field | High correlation |
| country is highly correlated with pollutant_id_SO2 and 6 other fields | High correlation |
| pollutant_id_SO2 is highly correlated with country | High correlation |
| pollutant_id_NH3 is highly correlated with country | High correlation |
| pollutant_id_NO2 is highly correlated with country | High correlation |
| pollutant_id_PM2.5 is highly correlated with country | High correlation |
| pollutant_id_CO is highly correlated with country | High correlation |
| pollutant_id_OZONE is highly correlated with country | High correlation |
| pollutant_id_PM10 is highly correlated with country | High correlation |
| df_index is highly correlated with id and 3 other fields | High correlation |
| id is highly correlated with df_index and 3 other fields | High correlation |
| state is highly correlated with df_index and 3 other fields | High correlation |
| city is highly correlated with df_index and 3 other fields | High correlation |
| station is highly correlated with df_index and 3 other fields | High correlation |
| pollutant_min is highly correlated with pollutant_max and 3 other fields | High correlation |
| pollutant_max is highly correlated with pollutant_min and 4 other fields | High correlation |
| pollutant_avg is highly correlated with pollutant_min and 3 other fields | High correlation |
| pollutant_id_NH3 is highly correlated with pollutant_max | High correlation |
| pollutant_id_PM10 is highly correlated with pollutant_min and 2 other fields | High correlation |
| pollutant_id_PM2.5 is highly correlated with pollutant_min and 2 other fields | High correlation |
| df_index has unique values | Unique |
| id has unique values | Unique |
| state has 28 (1.6%) zeros | Zeros |
Reproduction
| Analysis started | 2021-10-31 11:23:42.428467 |
|---|---|
| Analysis finished | 2021-10-31 11:24:03.408325 |
| Duration | 20.98 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
HIGH CORRELATION, UNIQUE
| Distinct | 1738 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 906.6317606 |
| Minimum | 0 |
|---|---|
| Maximum | 1835 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 86.85 |
| Q1 | 445.25 |
| median | 904.5 |
| Q3 | 1366.75 |
| 95-th percentile | 1728.15 |
| Maximum | 1835 |
| Range | 1835 |
| Interquartile range (IQR) | 921.5 |
Descriptive statistics
| Standard deviation | 530.7635132 |
|---|---|
| Coefficient of variation (CV) | 0.585423472 |
| Kurtosis | -1.218775098 |
| Mean | 906.6317606 |
| Median Absolute Deviation (MAD) | 461 |
| Skewness | 0.01509757116 |
| Sum | 1575726 |
| Variance | 281709.9069 |
| Monotonicity | Strictly increasing |
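Several of the figures reported above for df_index follow from one another by simple arithmetic. The sketch below is only a consistency check: it copies the values from the tables above, and the variable names are illustrative rather than taken from the report.

```python
from math import isclose

# Values copied from the df_index tables in this report.
n_observations = 1738
total = 1575726            # "Sum"
mean = 906.6317606         # "Mean"
std = 530.7635132          # "Standard deviation"
q1, q3 = 445.25, 1366.75   # "Q1" and "Q3"
minimum, maximum = 0, 1835

# Derived quantities and the identities that link them.
mean_from_sum = total / n_observations   # mean = sum / number of observations
variance = std ** 2                      # variance = squared standard deviation
cv = std / mean                          # coefficient of variation = std / mean
iqr = q3 - q1                            # interquartile range = Q3 - Q1
value_range = maximum - minimum          # range = maximum - minimum

print(mean_from_sum, variance, cv, iqr, value_range)
```

The derived values agree with the reported Mean, Variance, CV, IQR and Range to the precision shown in the tables.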
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 1215 | 1 | 0.1% |
| 1226 | 1 | 0.1% |
| 1225 | 1 | 0.1% |
| 1224 | 1 | 0.1% |
| 1223 | 1 | 0.1% |
| 1222 | 1 | 0.1% |
| 1221 | 1 | 0.1% |
| 1220 | 1 | 0.1% |
| 1219 | 1 | 0.1% |
| Other values (1728) | 1728 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 1835 | 1 | |
| 1834 | 1 | |
| 1833 | 1 | |
| 1832 | 1 | |
| 1831 | 1 | |
| 1830 | 1 | |
| 1829 | 1 | |
| 1828 | 1 | |
| 1827 | 1 | |
| 1826 | 1 |
id
Real number (ℝ≥0)
| Distinct | 1738 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 907.6317606 |
| Minimum | 1 |
|---|---|
| Maximum | 1836 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 87.85 |
| Q1 | 446.25 |
| median | 905.5 |
| Q3 | 1367.75 |
| 95-th percentile | 1729.15 |
| Maximum | 1836 |
| Range | 1835 |
| Interquartile range (IQR) | 921.5 |
Descriptive statistics
| Standard deviation | 530.7635132 |
|---|---|
| Coefficient of variation (CV) | 0.584778471 |
| Kurtosis | -1.218775098 |
| Mean | 907.6317606 |
| Median Absolute Deviation (MAD) | 461 |
| Skewness | 0.01509757116 |
| Sum | 1577464 |
| Variance | 281709.9069 |
| Monotonicity | Strictly increasing |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 1216 | 1 | 0.1% |
| 1227 | 1 | 0.1% |
| 1226 | 1 | 0.1% |
| 1225 | 1 | 0.1% |
| 1224 | 1 | 0.1% |
| 1223 | 1 | 0.1% |
| 1222 | 1 | 0.1% |
| 1221 | 1 | 0.1% |
| 1220 | 1 | 0.1% |
| Other values (1728) | 1728 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
| Value | Count | Frequency (%) |
| 1836 | 1 | |
| 1835 | 1 | |
| 1834 | 1 | |
| 1833 | 1 | |
| 1832 | 1 | |
| 1831 | 1 | |
| 1830 | 1 | |
| 1829 | 1 | |
| 1828 | 1 | |
| 1827 | 1 |
country
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 |
|---|
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1738 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1738 |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
state
Real number (ℝ≥0)
| Distinct | 26 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.87802071 |
| Minimum | 0 |
|---|---|
| Maximum | 25 |
| Zeros | 28 |
| Zeros (%) | 1.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 6 |
| median | 11 |
| Q3 | 21 |
| 95-th percentile | 24 |
| Maximum | 25 |
| Range | 25 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 7.639101168 |
|---|---|
| Coefficient of variation (CV) | 0.5931890729 |
| Kurtosis | -1.292040648 |
| Mean | 12.87802071 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 0.3092575676 |
| Sum | 22382 |
| Variance | 58.35586666 |
| Monotonicity | Increasing |
Histogram with fixed size bins (bins=26)
| Value | Count | Frequency (%) |
| 24 | 287 | |
| 5 | 242 | |
| 7 | 195 | |
| 13 | 188 | |
| 10 | 183 | |
| 12 | 82 | 4.7% |
| 6 | 80 | 4.6% |
| 25 | 69 | 4.0% |
| 2 | 66 | 3.8% |
| 20 | 66 | 3.8% |
| Other values (16) | 280 |
| Value | Count | Frequency (%) |
| 0 | 28 | 1.6% |
| 1 | 13 | 0.7% |
| 2 | 66 | 3.8% |
| 3 | 14 | 0.8% |
| 4 | 5 | 0.3% |
| 5 | 242 | |
| 6 | 80 | 4.6% |
| 7 | 195 | |
| 8 | 4 | 0.2% |
| 9 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 25 | 69 | 4.0% |
| 24 | 287 | |
| 23 | 2 | 0.1% |
| 22 | 35 | 2.0% |
| 21 | 52 | 3.0% |
| 20 | 66 | 3.8% |
| 19 | 51 | 2.9% |
| 18 | 7 | 0.4% |
| 17 | 13 | 0.7% |
| 16 | 7 | 0.4% |
city
Real number (ℝ≥0)
| Distinct | 142 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63.64729574 |
| Minimum | 0 |
|---|---|
| Maximum | 141 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 33 |
| median | 58 |
| Q3 | 93 |
| 95-th percentile | 131.15 |
| Maximum | 141 |
| Range | 141 |
| Interquartile range (IQR) | 60 |
Descriptive statistics
| Standard deviation | 38.62069801 |
|---|---|
| Coefficient of variation (CV) | 0.6067924419 |
| Kurtosis | -1.09428083 |
| Mean | 63.64729574 |
| Median Absolute Deviation (MAD) | 30.5 |
| Skewness | 0.2378790807 |
| Sum | 110619 |
| Variance | 1491.558315 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 33 | 242 | 13.9% |
| 93 | 112 | 6.4% |
| 17 | 56 | 3.2% |
| 28 | 42 | 2.4% |
| 83 | 41 | 2.4% |
| 77 | 37 | 2.1% |
| 107 | 35 | 2.0% |
| 58 | 35 | 2.0% |
| 2 | 35 | 2.0% |
| 1 | 34 | 2.0% |
| Other values (132) | 1069 |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.1% |
| 1 | 34 | |
| 2 | 35 | |
| 3 | 3 | 0.2% |
| 4 | 7 | 0.4% |
| 5 | 7 | 0.4% |
| 6 | 7 | 0.4% |
| 7 | 7 | 0.4% |
| 8 | 7 | 0.4% |
| 9 | 7 | 0.4% |
| Value | Count | Frequency (%) |
| 141 | 7 | 0.4% |
| 140 | 7 | 0.4% |
| 139 | 7 | 0.4% |
| 138 | 7 | 0.4% |
| 137 | 7 | 0.4% |
| 136 | 7 | 0.4% |
| 135 | 27 | |
| 134 | 5 | 0.3% |
| 133 | 7 | 0.4% |
| 132 | 6 | 0.3% |
station
Real number (ℝ≥0)
| Distinct | 280 |
|---|---|
| Distinct (%) | 16.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 139.9131185 |
| Minimum | 0 |
|---|---|
| Maximum | 280 |
| Zeros | 7 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 70 |
| median | 140.5 |
| Q3 | 210 |
| 95-th percentile | 267 |
| Maximum | 280 |
| Range | 280 |
| Interquartile range (IQR) | 140 |
Descriptive statistics
| Standard deviation | 80.72199348 |
|---|---|
| Coefficient of variation (CV) | 0.5769437085 |
| Kurtosis | -1.187803526 |
| Mean | 139.9131185 |
| Median Absolute Deviation (MAD) | 69.5 |
| Skewness | 0.005775083739 |
| Sum | 243169 |
| Variance | 6516.040231 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 215 | 7 | 0.4% |
| 131 | 7 | 0.4% |
| 19 | 7 | 0.4% |
| 36 | 7 | 0.4% |
| 39 | 7 | 0.4% |
| 53 | 7 | 0.4% |
| 109 | 7 | 0.4% |
| 113 | 7 | 0.4% |
| 140 | 7 | 0.4% |
| 147 | 7 | 0.4% |
| Other values (270) | 1668 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 1 | 7 | |
| 2 | 6 | |
| 3 | 7 | |
| 4 | 5 | |
| 5 | 7 | |
| 6 | 5 | |
| 7 | 6 | |
| 8 | 3 | |
| 9 | 7 |
| Value | Count | Frequency (%) |
| 280 | 7 | |
| 279 | 7 | |
| 278 | 6 | |
| 277 | 6 | |
| 276 | 6 | |
| 275 | 6 | |
| 274 | 7 | |
| 273 | 6 | |
| 272 | 2 | 0.1% |
| 271 | 7 |
pollutant_min
Real number (ℝ≥0)
| Distinct | 149 |
|---|---|
| Distinct (%) | 8.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.41426928 |
| Minimum | 1 |
|---|---|
| Maximum | 217 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 5 |
| median | 14 |
| Q3 | 39 |
| 95-th percentile | 107.15 |
| Maximum | 217 |
| Range | 216 |
| Interquartile range (IQR) | 34 |
Descriptive statistics
| Standard deviation | 34.40381054 |
|---|---|
| Coefficient of variation (CV) | 1.210793429 |
| Kurtosis | 3.413806682 |
| Mean | 28.41426928 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 1.851160344 |
| Sum | 49384 |
| Variance | 1183.62218 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 148 | 8.5% |
| 2 | 116 | 6.7% |
| 4 | 86 | 4.9% |
| 3 | 82 | 4.7% |
| 5 | 70 | 4.0% |
| 6 | 67 | 3.9% |
| 8 | 56 | 3.2% |
| 10 | 44 | 2.5% |
| 7 | 42 | 2.4% |
| 9 | 35 | 2.0% |
| Other values (139) | 992 |
| Value | Count | Frequency (%) |
| 1 | 148 | |
| 2 | 116 | |
| 3 | 82 | |
| 4 | 86 | |
| 5 | 70 | |
| 6 | 67 | |
| 7 | 42 | 2.4% |
| 8 | 56 | 3.2% |
| 9 | 35 | 2.0% |
| 10 | 44 | 2.5% |
| Value | Count | Frequency (%) |
| 217 | 1 | |
| 200 | 1 | |
| 193 | 1 | |
| 184 | 1 | |
| 182 | 1 | |
| 175 | 1 | |
| 172 | 1 | |
| 161 | 1 | |
| 155 | 2 | |
| 153 | 1 |
pollutant_max
Real number (ℝ≥0)
| Distinct | 340 |
|---|---|
| Distinct (%) | 19.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 96.87341772 |
| Minimum | 1 |
|---|---|
| Maximum | 500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 21 |
| median | 63 |
| Q3 | 124 |
| 95-th percentile | 335 |
| Maximum | 500 |
| Range | 499 |
| Interquartile range (IQR) | 103 |
Descriptive statistics
| Standard deviation | 104.7650939 |
|---|---|
| Coefficient of variation (CV) | 1.081463794 |
| Kurtosis | 2.488366121 |
| Mean | 96.87341772 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 1.68591993 |
| Sum | 168366 |
| Variance | 10975.7249 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6 | 47 | 2.7% |
| 8 | 32 | 1.8% |
| 3 | 29 | 1.7% |
| 2 | 29 | 1.7% |
| 10 | 28 | 1.6% |
| 9 | 26 | 1.5% |
| 7 | 25 | 1.4% |
| 5 | 25 | 1.4% |
| 4 | 25 | 1.4% |
| 11 | 23 | 1.3% |
| Other values (330) | 1449 |
| Value | Count | Frequency (%) |
| 1 | 13 | 0.7% |
| 2 | 29 | |
| 3 | 29 | |
| 4 | 25 | |
| 5 | 25 | |
| 6 | 47 | |
| 7 | 25 | |
| 8 | 32 | |
| 9 | 26 | |
| 10 | 28 |
| Value | Count | Frequency (%) |
| 500 | 8 | |
| 495 | 1 | 0.1% |
| 489 | 1 | 0.1% |
| 487 | 1 | 0.1% |
| 477 | 1 | 0.1% |
| 474 | 1 | 0.1% |
| 473 | 1 | 0.1% |
| 470 | 1 | 0.1% |
| 469 | 1 | 0.1% |
| 467 | 1 | 0.1% |
pollutant_avg
Real number (ℝ≥0)
| Distinct | 237 |
|---|---|
| Distinct (%) | 13.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 54.10069045 |
| Minimum | 1 |
|---|---|
| Maximum | 314 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 12 |
| median | 31 |
| Q3 | 70 |
| 95-th percentile | 194.15 |
| Maximum | 314 |
| Range | 313 |
| Interquartile range (IQR) | 58 |
Descriptive statistics
| Standard deviation | 60.82415825 |
|---|---|
| Coefficient of variation (CV) | 1.124276931 |
| Kurtosis | 2.54020862 |
| Mean | 54.10069045 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 1.72497588 |
| Sum | 94027 |
| Variance | 3699.578226 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5 | 52 | 3.0% |
| 4 | 50 | 2.9% |
| 2 | 46 | 2.6% |
| 6 | 46 | 2.6% |
| 12 | 40 | 2.3% |
| 3 | 39 | 2.2% |
| 7 | 38 | 2.2% |
| 10 | 35 | 2.0% |
| 8 | 34 | 2.0% |
| 1 | 30 | 1.7% |
| Other values (227) | 1328 |
| Value | Count | Frequency (%) |
| 1 | 30 | |
| 2 | 46 | |
| 3 | 39 | |
| 4 | 50 | |
| 5 | 52 | |
| 6 | 46 | |
| 7 | 38 | |
| 8 | 34 | |
| 9 | 30 | |
| 10 | 35 |
| Value | Count | Frequency (%) |
| 314 | 1 | |
| 309 | 1 | |
| 308 | 1 | |
| 297 | 1 | |
| 293 | 1 | |
| 290 | 1 | |
| 289 | 1 | |
| 284 | 1 | |
| 282 | 1 | |
| 280 | 2 |
pollutant_id_CO
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1469 | |
| 1 | 269 | 15.5% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1469 | |
| 1 | 269 | 15.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_NH3
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1520 | |
| 1 | 218 | 12.5% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1520 | |
| 1 | 218 | 12.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_NO2
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1487 | |
| 1 | 251 | 14.4% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1487 | |
| 1 | 251 | 14.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_OZONE
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1481 | |
| 1 | 257 | 14.8% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1481 | |
| 1 | 257 | 14.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_PM10
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1493 | |
| 1 | 245 | 14.1% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1493 | |
| 1 | 245 | 14.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_PM2.5
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1486 | |
| 1 | 252 | 14.5% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1486 | |
| 1 | 252 | 14.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
pollutant_id_SO2
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 98.6 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1492 | |
| 1 | 246 | 14.2% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 1492 | |
| 1 | 246 | 14.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
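The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is a minimal illustration, not the code the report generator runs; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and tied values are assumed to receive the average of their rank positions.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship (made-up data).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because y increases whenever x does, the ranks agree exactly and ρ reaches 1 (up to floating-point rounding), while Pearson's r on the raw values stays below 1.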
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
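As a small illustration of the formula and of the invariance property, the sketch below computes r for the pollutant_min and pollutant_max values of the first seven rows shown under "First rows", then again after shifting and rescaling both variables. The function name `pearson_r` and the shift and scale constants are arbitrary choices made for this example; both scale factors are positive.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# pollutant_min and pollutant_max for rows 0-6 of the "First rows" table.
pollutant_min = [69.0, 82.0, 10.0, 4.0, 16.0, 15.0, 4.0]
pollutant_max = [109.0, 138.0, 42.0, 5.0, 42.0, 45.0, 82.0]

r_original = pearson_r(pollutant_min, pollutant_max)

# Separate changes of location and (positive) scale for each variable.
shifted_min = [2.0 * v + 10.0 for v in pollutant_min]
shifted_max = [0.5 * v - 3.0 for v in pollutant_max]
r_transformed = pearson_r(shifted_min, shifted_max)

print(r_original, r_transformed)
```

The two printed values agree up to floating-point rounding, which is the invariance the paragraph describes.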
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
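The pair-counting definition above can be sketched directly. This minimal version implements exactly the formula as stated (often called τ-a) and assumes the data contain no ties; tied pairs are counted as neither concordant nor discordant, and libraries commonly apply a tie correction that is not shown here. The function name `kendall_tau_a` and the sample data are made up for the example.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether the pair is ordered the same way in x and y.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # only the pair (2, 3) is ordered differently from x
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.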
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
First rows
| | df_index | id | country | state | city | station | pollutant_min | pollutant_max | pollutant_avg | pollutant_id_CO | pollutant_id_NH3 | pollutant_id_NO2 | pollutant_id_OZONE | pollutant_id_PM10 | pollutant_id_PM2.5 | pollutant_id_SO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 6 | 215 | 69.0 | 109.0 | 86.0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 1 | 2 | 0 | 0 | 6 | 215 | 82.0 | 138.0 | 105.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 2 | 3 | 0 | 0 | 6 | 215 | 10.0 | 42.0 | 19.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 3 | 4 | 0 | 0 | 6 | 215 | 4.0 | 5.0 | 4.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | 5 | 0 | 0 | 6 | 215 | 16.0 | 42.0 | 27.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 5 | 6 | 0 | 0 | 6 | 215 | 15.0 | 45.0 | 32.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 6 | 7 | 0 | 0 | 6 | 215 | 4.0 | 82.0 | 42.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7 | 7 | 8 | 0 | 0 | 113 | 3 | 47.0 | 111.0 | 71.0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 8 | 8 | 9 | 0 | 0 | 113 | 3 | 49.0 | 120.0 | 86.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 9 | 9 | 10 | 0 | 0 | 113 | 3 | 11.0 | 44.0 | 23.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Last rows
| | df_index | id | country | state | city | station | pollutant_min | pollutant_max | pollutant_avg | pollutant_id_CO | pollutant_id_NH3 | pollutant_id_NO2 | pollutant_id_OZONE | pollutant_id_PM10 | pollutant_id_PM2.5 | pollutant_id_SO2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1728 | 1826 | 1827 | 0 | 25 | 77 | 196 | 2.0 | 15.0 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1729 | 1827 | 1828 | 0 | 25 | 77 | 196 | 31.0 | 85.0 | 39.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1730 | 1828 | 1829 | 0 | 25 | 77 | 196 | 6.0 | 76.0 | 31.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1731 | 1829 | 1830 | 0 | 25 | 77 | 269 | 28.0 | 75.0 | 54.0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1732 | 1830 | 1831 | 0 | 25 | 77 | 269 | 36.0 | 101.0 | 74.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1733 | 1831 | 1832 | 0 | 25 | 77 | 269 | 10.0 | 22.0 | 15.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1734 | 1832 | 1833 | 0 | 25 | 77 | 269 | 1.0 | 3.0 | 2.0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1735 | 1833 | 1834 | 0 | 25 | 77 | 269 | 6.0 | 28.0 | 10.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1736 | 1834 | 1835 | 0 | 25 | 77 | 269 | 34.0 | 92.0 | 41.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1737 | 1835 | 1836 | 0 | 25 | 77 | 269 | 10.0 | 116.0 | 43.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |